16. Lesson Recap

EHR Data Security and Analysis Lesson Recap

EHR Data Security and Analysis

You have made it to the end of EHR Data Security & Analysis! Well done.
You now know about data security and privacy, including some of the key standards and regulations. You have gained key information about working with EHR data that will protect both your data and you.

You also explored your way through data, gaining a deeper understanding of your datasets by analyzing a dataset schema and looking at value distributions, missing values, and the cardinality of categorical features. It was a great adventure, wasn't it? I hope you took some pictures! Oh, wait, maybe that wouldn't be a great data security practice.
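As a quick refresher, the exploration steps above (schema, value distributions, missing values, and cardinality) can be sketched in a few lines. This is only a minimal illustration using the Python standard library. The records, the column names, and the `profile` helper are all hypothetical and fully synthetic; they are not from the lesson's dataset.

```python
from collections import Counter

# Hypothetical, fully synthetic records; None marks a missing value.
records = [
    {"patient_id": 1, "age": 34, "gender": "F", "primary_diagnosis_code": "E11.9"},
    {"patient_id": 2, "age": None, "gender": "M", "primary_diagnosis_code": "I10"},
    {"patient_id": 3, "age": 58, "gender": "F", "primary_diagnosis_code": None},
    {"patient_id": 4, "age": 58, "gender": None, "primary_diagnosis_code": "I10"},
]

def profile(rows):
    """Summarize each column: observed types, missing count, cardinality, top values."""
    summary = {}
    for col in rows[0].keys():
        values = [row[col] for row in rows]
        present = [v for v in values if v is not None]
        counts = Counter(present)  # value distribution of the non-missing entries
        summary[col] = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing": len(values) - len(present),
            "cardinality": len(counts),
            "top_values": counts.most_common(3),
        }
    return summary

summary = profile(records)
for col, stats in summary.items():
    print(col, stats)
```

On a real EHR dataset you would likely do the same thing with a dataframe library, but the questions stay the same: what type is each field, how much of it is missing, and how many distinct values does it take?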

And finally, you completed a demographic dataset analysis. You can now make sure that your dataset is representative before a lack of representation becomes a problem.
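One simple way to think about "representative" is to compare the share of each demographic group in your dataset against a reference population. The sketch below assumes you have such reference shares available. The group labels, the 50/50 reference, the 5-point tolerance, and the helper names are all made up for illustration.

```python
from collections import Counter

def group_shares(values):
    """Return the fraction of rows that fall in each group."""
    counts = Counter(values)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def representation_gaps(sample_values, reference_shares):
    """Sample share minus reference share for every reference group."""
    sample = group_shares(sample_values)
    return {g: sample.get(g, 0.0) - ref for g, ref in reference_shares.items()}

def underrepresented(gaps, tolerance=0.05):
    """Groups whose sample share falls short of the reference by more than tolerance."""
    return sorted(g for g, gap in gaps.items() if gap < -tolerance)

# Synthetic example: 30 female and 70 male patients in the dataset.
genders = ["F"] * 30 + ["M"] * 70
reference = {"F": 0.5, "M": 0.5}  # assumed reference population, not real data

gaps = representation_gaps(genders, reference)
print(gaps)
print(underrepresented(gaps))
```

The tolerance is a judgment call, and a gap is not automatically a problem. It is a prompt to ask whether the model will later be used on groups it has barely seen.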

Once again, congratulations! You're ready to work with real healthcare datasets. You are also able to ensure that the data behind the training and predictions of the models you build later is truly representative. The next lesson is filled with the intrigue and mysteries surrounding… EHR Code Sets. I bet you didn't guess that one. I'll see you in the next lesson, where you'll learn all about dealing with Diagnosis, Procedure, and Medication Codes.

Lesson Key Terms

Key Term: Definition
Providers: Entities, groups, or individuals that provide care for patients. Providers range in size from entire hospital networks to individual doctors and other medical professionals.
Payers: Organizations that pay for care, such as health insurance companies. Payers also include government entities, for example through Medicaid and Medicare in the United States.
HIPAA: The Health Insurance Portability and Accountability Act, the key industry regulation to be familiar with in the U.S.
HITECH: The Health Information Technology for Economic and Clinical Health Act, which is also important to note. It extends HIPAA to account for technology.
EU: European Union
GDPR: The General Data Protection Regulation, which is generally considered more stringent than even HIPAA when it comes to protections for patients.
DPA: The UK's Data Protection Act, which builds on and adds to GDPR.
PHI: Protected Health Information
Covered Entities: Industry organizations defined by HIPAA as belonging to one of three groups: health insurance plans, providers, or clearinghouses. You can see from the table the types of entities in each category.
Business Associates: A person or entity that performs certain functions or activities involving the use or disclosure of protected health information on behalf of, or provides services to, a covered entity.
Business Associates Agreement/Addendum (BAA): The contract between a covered entity and a business associate.
De-identifying a Dataset: The removal of identifying fields, such as name and address, from a dataset. De-identification reduces privacy risks to individuals and supports the secondary use of data, for example in research.
Expert Determination Method: A de-identification method in which a statistician determines that the risk that an individual could be identified is small enough.
Safe Harbor: A de-identification method based on the removal of 18 types of identifiers, such as name and zip code.
EDA: Exploratory Data Analysis
CRISP-DM: Stands for "Cross-Industry Standard Process for Data Mining". It is a common framework for data science projects and includes a series of steps from business understanding to deployment.
MCAR: Missing Completely at Random. The data is missing for reasons unrelated to the data itself, so there is no systematic reason for the missing data.
MAR: Missing at Random. Despite the name, there is a systematic relationship between the probability of missing data and other observed values in the dataset, but not the missing value itself.
MNAR: Missing Not at Random. The probability that a value is missing depends on the missing value itself.
Cardinality: The number of unique values that a feature has. It is relevant to EHR datasets because code sets such as diagnosis codes contain on the order of tens of thousands of unique codes.
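The three missing-data terms above (MCAR, MAR, MNAR) are easy to mix up, so here is a small simulation that may help. It is only a sketch on synthetic data. The `age` and `glucose` fields, the thresholds, and the missingness probabilities are arbitrary assumptions chosen to make the patterns visible.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic rows: age is always observed, glucose may go missing.
rows = [
    {"age": 20 + int(rng.random() * 70), "glucose": 70 + rng.random() * 130}
    for _ in range(2000)
]

def drop_glucose(rows, prob_missing):
    """Copy rows, blanking glucose with a row-specific probability."""
    masked = []
    for row in rows:
        missing = rng.random() < prob_missing(row)
        masked.append({"age": row["age"], "glucose": None if missing else row["glucose"]})
    return masked

# MCAR: every value has the same chance of being missing.
mcar = drop_glucose(rows, lambda r: 0.3)
# MAR: missingness depends on an observed field (age), not on glucose itself.
mar = drop_glucose(rows, lambda r: 0.6 if r["age"] >= 65 else 0.1)
# MNAR: missingness depends on the glucose value that goes missing.
mnar = drop_glucose(rows, lambda r: 0.6 if r["glucose"] > 150 else 0.1)

def missing_rate(rows, keep):
    """Fraction of missing glucose values among rows selected by keep."""
    subset = [r for r in rows if keep(r)]
    return sum(r["glucose"] is None for r in subset) / len(subset)

def observed_mean(rows):
    """Mean of the glucose values that are still present."""
    seen = [r["glucose"] for r in rows if r["glucose"] is not None]
    return sum(seen) / len(seen)

true_mean = observed_mean(rows)
mar_rate_older = missing_rate(mar, lambda r: r["age"] >= 65)
mar_rate_younger = missing_rate(mar, lambda r: r["age"] < 65)
mnar_mean = observed_mean(mnar)

print("MCAR overall missing rate:", round(missing_rate(mcar, lambda r: True), 3))
print("MAR missing rate, age >= 65 vs < 65:", round(mar_rate_older, 3), round(mar_rate_younger, 3))
print("True mean vs MNAR observed mean:", round(true_mean, 1), round(mnar_mean, 1))
```

Under MAR, the missing rate differs between age groups, which you can see from the observed data. Under MNAR, the observed mean is biased, but here that is only visible because the simulation keeps the true values. In a real dataset you generally cannot confirm MNAR from the observed data alone.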